On-line soft error correction in matrix-matrix multiplication
Abstract
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. They normally do not interrupt the execution of the affected program, but the affected computation results can no longer be trusted. A well-known technique to correct soft errors in matrix-matrix multiplication is algorithm-based fault tolerance (ABFT). While ABFT achieves much better efficiency than triple modular redundancy (TMR), a traditional general technique to correct soft errors, both ABFT and TMR detect errors off-line, after the computation is finished. This paper extends the traditional ABFT technique from off-line to on-line, so that soft errors in matrix-matrix multiplication can be detected in the middle of the computation, during program execution, and higher efficiency can be achieved by correcting the corrupted computations in a timely manner. Experimental results demonstrate that the proposed technique can correct one error every ten seconds with negligible (i.e. less than 1%) performance penalty.
Keywords: Algorithm-based fault tolerance; Matrix multiplication; Fault tolerant linear algebra; On-line algorithm based fault tolerance
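The checksum idea behind ABFT, and what checking "in the middle of the computation" could look like, can be illustrated with a small sketch. It assumes the classic checksum encoding (a column-checksum row appended to A, a row-checksum column appended to B) and an outer-product formulation of the multiplication, so that the checksum relation holds after every rank-1 update. The function names, the `check_every` parameter, and the `fault` injection hook are hypothetical and chosen for illustration only; this is not the paper's implementation, and it handles only a single corrupted data entry between two checks.

```python
def encode_column_checksum(A):
    """Append a row holding the sum of each column of A."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]


def encode_row_checksum(B):
    """Append to each row of B the sum of that row."""
    return [row[:] + [sum(row)] for row in B]


def locate_error(Cf, tol=1e-9):
    """Return (i, j, delta) if exactly one data entry violates both its
    row and column checksum; otherwise return None."""
    m, n = len(Cf) - 1, len(Cf[0]) - 1
    bad_rows = []
    for i in range(m):
        d = sum(Cf[i][:n]) - Cf[i][n]
        if abs(d) > tol:
            bad_rows.append((i, d))
    bad_cols = []
    for j in range(n):
        d = sum(Cf[i][j] for i in range(m)) - Cf[m][j]
        if abs(d) > tol:
            bad_cols.append((j, d))
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0][0], bad_cols[0][0], bad_rows[0][1]
    return None


def online_abft_matmul(A, B, check_every=1, fault=None):
    """Multiply A and B as a sum of outer products, verifying the
    checksums every `check_every` steps instead of only at the end.

    `fault` is a hypothetical test hook (step, i, j, delta) that adds
    `delta` to entry (i, j) after the given step to mimic a soft error.
    """
    Ac, Br = encode_column_checksum(A), encode_row_checksum(B)
    m, n, inner = len(A), len(B[0]), len(B)
    Cf = [[0] * (n + 1) for _ in range(m + 1)]
    corrected = []
    for k in range(inner):
        # Rank-1 update: the checksum relation is preserved by each step.
        for i in range(m + 1):
            for j in range(n + 1):
                Cf[i][j] += Ac[i][k] * Br[k][j]
        if fault is not None and fault[0] == k:
            _, fi, fj, delta = fault
            Cf[fi][fj] += delta  # injected soft error
        if (k + 1) % check_every == 0 or k == inner - 1:
            err = locate_error(Cf)
            if err is not None:
                i, j, d = err
                Cf[i][j] -= d  # subtract the checksum discrepancy
                corrected.append((k, i, j))
    return [row[:n] for row in Cf[:m]], corrected


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, corrected = online_abft_matmul(A, B, fault=(0, 1, 0, 100))
print(C, corrected)
```

In this sketch a smaller `check_every` catches an error sooner at the cost of more checksum verifications, while a larger value approaches the off-line behavior of checking only once at the end.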
Similar resources
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on the Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...
Algorithm-Based Secure and Fault Tolerant Outsourcing of Matrix Computations
We study interactive algorithmic schemes for outsourcing matrix computations on untrusted global computing infrastructures such as clouds or volunteer peer-to-peer platforms. In these schemes the client outsources part of the computation with guarantees on both the inputs’ secrecy and the output’s integrity. For the sake of efficiency, thanks to interaction, the number of operations performed by th...
Error correction in fast matrix multiplication and inverse
We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs and the erroneous output. Their running time is softly linear in the number of nonzero entries in these matrices when the number of errors is sufficiently sm...
Matrix-Vector Multiplication via Erasure Decoding
The problem of fast evaluation of a matrix-vector product over GF(2) is considered. The problem is reduced to erasure decoding of a linear error-correcting code. A large set of redundant parity check equations for this code is generated. The multiplication algorithm is derived by tracking the execution of the message-passing algorithm on the obtained set of parity check equations. The obtained...
Autotuning GEMMs for Fermi
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial component of numerical software...
Journal title:
- J. Comput. Science
Volume 4, Issue
Pages -
Publication date 2013